This code performs exploratory analysis of the relationship between the trust variables and demographic factors.
The workflow is:
- Call the summarised dataset.
- Analyse data availability and correlations to select key variables.
- Select the final demographic variables.
- Clean these variables for modelling.
The focus of the paper is trust in the European Parliament and its relationship to trust in the local parliament, so only these two variables are taken forward to the multivariate analysis. The others are not included.
The following chunk loads the data by running the “Univariate analysis.Rmd” file. That file in turn runs the “Dataframe creation_ESS.Rmd” file, so it takes a while. The main useful outputs are:
We use GB data only when checking which variables to remove, since GB is our focus.
We also restrict the check to rounds after R5, as these are the rounds used for the modelling.
Check sociodemographic variables:
gb_data_cleaning <- ess_data |>
filter(cntry == "GB",
essround > 5)
# check overall missing
gb_data_cleaning |> plot_missing()
# select all variables with over 50% missing values
missing50plus <- gb_data_cleaning |>
select(where(~ mean(is.na(.)) * 100 > 50)) |>
names()
print(missing50plus)
## [1] "psu" "stratum" "feethngr" "prtvtbgb" "prtclbgb" "prtdgcl"
# make sure we don't remove the survey design variables (psu, stratum)
missing50plus <- missing50plus[!(missing50plus %in% c("psu", "stratum"))]
## remove these variables from main and cleaning data:
gb_data_cleaning <- gb_data_cleaning |>
select(-all_of(missing50plus))
ess_data <- ess_data |>
select(-all_of(missing50plus))
We have lost mostly variables about political party alignment (these were only asked in R7 and R8), plus whether the individual feels aligned to the main ethnic group in the country. We have comparable political and ethnicity variables, so this is fine.
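The missingness filter above can be tried on a small toy data frame. This is a minimal sketch, assuming `dplyr` is installed; the columns `x_sparse` and `y` are hypothetical and only illustrate how the 50% threshold and the protection of design variables interact.

```r
library(dplyr)

# Toy data: `psu` and `x_sparse` are 75% missing, `y` is complete.
toy <- tibble(
  psu      = c(NA, NA, NA, 1),
  x_sparse = c(NA, NA, NA, 2),
  y        = c(1, 2, 3, 4)
)

# Names of columns with more than 50% missing values
missing50plus <- toy |>
  select(where(~ mean(is.na(.x)) * 100 > 50)) |>
  names()

# Protect the survey design variables from being dropped
missing50plus <- setdiff(missing50plus, c("psu", "stratum"))

# all_of() makes the use of an external character vector explicit
toy_reduced <- toy |> select(-all_of(missing50plus))
names(toy_reduced)
```

Using `all_of()` also avoids the tidyselect deprecation warning that appears when a bare character vector is passed to `select()`.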
Next, check missingness in more detail for the difference-in-differences (DiD) analysis, looking for variables with high missingness in our key waves.
## check for missing in key rounds of data with DiD
gb_data_cleaning |>
select(-psu, -stratum) |>
filter(essround %in% c(6,7,8)) |>
plot_missing()
gb_data_cleaning |>
filter(essround %in% c(6,7,8)) |>
select(where(~ mean(is.na(.)) * 100 > 50)) |>
names()
## [1] "psu" "stratum" "netusoft" "nwspol" "pstplonl" "atchctr" "atcherp"
We do not remove these variables, as many are available from R8 onwards. They will not be included in the DiD but may be useful for the regression and for identifying trends.
## check overall response rates
ess_data |>
filter(cntry == "GB") |>
select(any_of(
c("vote", # Voted last national election (Yes/No)
"prtvtbgb", # Party voted for in last national election
"prtclbgb", # Which party feel closer to, United Kingdom
"prtdgcl", # How close the respondent feels to the party from 'prtclbgb'
"lrscale", # left right political scale
"polintr" # level of political interest
))) |>
plot_missing()
We retain political interest, voted in last election and left right scale.
# specify our remaining political variables for calling later:
polit_vars <- c("vote", "lrscale", "polintr")
Check multicollinearity
We see these are all valid. Voting is more correlated with voting.
## Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
## ℹ Please use `all_of()` or `any_of()` instead.
## # Was:
## data %>% select(polit_vars)
##
## # Now:
## data %>% select(all_of(polit_vars))
##
## See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
corrplot(cor_matrix, method = "color", type = "upper",
addCoef.col = "black", tl.col = "black",
col = colorRampPalette(c("blue", "white", "red"))(200))
# identify vars
inc_edu_vars <- c(
"eisced", # Highest level of education of respondent
"pdwrk", # In paid work
"hinctnta", # Household's total net income, all sources (reported in deciles)
"hincfel") # Feeling about household's income nowadays - financial stress
# plot correlations:
cor_matrix <- cor(gb_data_cleaning |>
select(all_of(inc_edu_vars)),
use = "pairwise.complete.obs")
corrplot(cor_matrix, method = "color", type = "upper",
addCoef.col = "black", tl.col = "black",
col = colorRampPalette(c("blue", "white", "red"))(200))
We keep one education variable, plus income and income stress. Financial stress gives us a broader, relative measure of financial ability. We will regroup and clean these variables later. Note that the education variable (eisced) has missing values.
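Because the correlation matrices here use `use = "pairwise.complete.obs"`, each cell can be based on a different number of respondents. The sketch below uses base R only and a hypothetical toy data frame (columns `a`, `b`, `c` are not from the ESS data) to show how pairwise and listwise deletion differ and how to count the rows behind each cell.

```r
# Toy data with scattered missing values (hypothetical columns a, b, c)
toy <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, NA, 10),
  c = c(5, 3, NA, 2, 1)
)

# Pairwise: each pair uses every row where both variables are observed
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

# Listwise: only rows with no missing values at all (rows 1 and 2 here)
cor_listwise <- cor(toy, use = "complete.obs")

# Number of rows contributing to each pairwise correlation
n_pairwise <- crossprod(!is.na(as.matrix(toy)) * 1)
n_pairwise
```

Checking `n_pairwise` alongside the correlation plot is a quick way to see whether a coefficient rests on far fewer observations than its neighbours.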
Set the variables, plot missing and distribution.
satis_vars <- c(
# satisfaction with life and country
"stflife", # How satisfied with life as a whole
"stfdem", # How satisfied with the way democracy works in country
"stfeco", # How satisfied with present state of economy in country
"stfgov", # How satisfied with the national government
"stfedu", # State of education in country nowadays
"stfhlth" # State of health services in country nowadays
)
cor_matrix <- cor(gb_data_cleaning |>
select(all_of(satis_vars)),
use = "pairwise.complete.obs")
corrplot(cor_matrix, method = "color", type = "upper",
addCoef.col = "black", tl.col = "black",
col = colorRampPalette(c("blue", "white", "red"))(200))
All variables have some validity but are highly correlated. We use the
life satisfaction and satisfaction with the economy variables, as the
literature indicates these were related to the Brexit vote.
immig_vars <- c(
"imsmetn", #Allow many/few immigrants of same race/ethnic group as majority
"imdfetn", #Allow many/few immigrants of different race/ethnic group from majority
"impcntr", #Allow many/few immigrants from poorer countries outside Europe
"imbgeco", #Immigration bad or good for country's economy
"imueclt", #Country's cultural life undermined or enriched by immigrants
"imwbcnt") #Immigrants make country worse or better place to live
cor_matrix <- cor(gb_data_cleaning |>
filter(cntry == "GB") |>
select(all_of(immig_vars)),
use = "pairwise.complete.obs")
corrplot(cor_matrix, method = "color", type = "upper",
addCoef.col = "black", tl.col = "black",
col = colorRampPalette(c("blue", "white", "red"))(200))
# plot just overall attitudes. We see they have increased overall (unweighted data)
ess_data |>
group_by(essround) |>
summarise(avg_immig_att = mean(imwbcnt, na.rm = TRUE)) |>
ggplot()+
geom_line(aes(x=essround, y=avg_immig_att))
The correlations are quite high. We also see many 5/10 responses, giving a large modal value. This may reflect people not wanting to appear xenophobic and hiding their true views.
We will continue with “imwbcnt” (Immigrants make country worse or better place to live). We will group this as yes, no and moderate rather than focusing on the extremes. This will be hard to exploit, as many people appear to be hiding their true views.
## [1] "essround" "cntry"
## [3] "country_name" "anweight"
## [5] "pspwght" "pweight"
## [7] "psu" "stratum"
## [9] "trust_people" "trust_europ"
## [11] "trust_legal" "trust_police"
## [13] "trust_politicians" "trust_parliament"
## [15] "trust_polparties" "trust_un"
## [17] "gndr" "agea"
## [19] "hhmmb" "eisced"
## [21] "pdwrk" "hinctnta"
## [23] "hincfel" "ctzcntr"
## [25] "brncntr" "blgetmg"
## [27] "region" "domicil"
## [29] "happy" "health"
## [31] "hlthhmp" "vote"
## [33] "lrscale" "polintr"
## [35] "clsprty" "euftf"
## [37] "stflife" "stfdem"
## [39] "stfeco" "stfgov"
## [41] "stfedu" "stfhlth"
## [43] "netusoft" "nwspol"
## [45] "pstplonl" "imsmetn"
## [47] "imdfetn" "impcntr"
## [49] "imbgeco" "imueclt"
## [51] "imwbcnt" "atchctr"
## [53] "atcherp" "trust_people_hilow"
## [55] "trust_europ_hilow" "trust_legal_hilow"
## [57] "trust_police_hilow" "trust_politicians_hilow"
## [59] "trust_parliament_hilow" "trust_polparties_hilow"
## [61] "trust_un_hilow" "trust_people_extreme"
## [63] "trust_europ_extreme" "trust_legal_extreme"
## [65] "trust_police_extreme" "trust_politicians_extreme"
## [67] "trust_parliament_extreme" "trust_polparties_extreme"
## [69] "trust_un_extreme"
# choose variables for demographics:
demog_vars <- c(
## ID and weights
"essround",
"cntry",
"anweight",
# trust variables (local and EU parliament only)
"trust_europ",
"trust_parliament",
# some demographic variables
"gndr", # Gender
"agea", # Calculated age of respondent
"hlthhmp", # Hampered in daily activities by illness/disability/infirmity/mental problem
"hhmmb", # number of hh members
# income + education
"eisced", # Highest level of education of respondent
"hinctnta", # Household's total net income, all sources (reported in deciles)
"hincfel", # Feeling about household's income nowadays - financial stress
# citizenship/ethnicity
"blgetmg", # Belong to minority ethnic group in country
# location
"domicil", # type of area 5pt scale - A big city, Suburbs or outskirts, town or small city, village, countryside
"region", # region (not for model but maybe if maps made.)
# health
"hlthhmp", # Hampered in daily activities by illness/disability/infirmity/mental problem
# political identifiers
"lrscale", # left right political scale
"polintr", # level of political interest
# satisfaction with life and country
"stflife", # How satisfied with life as a whole
"stfeco", # How satisfied with present state of economy in country
# immigration
"imwbcnt", #Immigrants make country worse or better place to live
## variables only available from R8 - for use in regression modelling.
"netusoft", # internet use - R8 onwards only
"nwspol", # Newspaper reading, politics/current affairs on average weekday - R8 onwards
"pstplonl", # Posted or shared anything about politics online last 12 months - R8 on
"atchctr" # How emotionally attached to [country] - R8 onwards
)
# now limit to only variables of interest
ess_clean <- ess_data |>
filter(essround >5) |>
select(any_of(demog_vars))
Check distributions
# for misc variables
ess_clean |>
filter(cntry == "GB") |>
select(hhmmb, hinctnta, netusoft, atchctr, pstplonl, imwbcnt, lrscale) |>
plot_histogram()
##
## Yes a lot Yes to some extent No Refusal
## 13166 43412 143574 0
## Don't know No answer
## 0 0
##
## Not at all emotionally attached 1
## 1547 855
## 2 3
## 1927 3004
## 4 5
## 3215 9198
## 6 7
## 8798 17673
## 8 9
## 28379 20162
## Very emotionally attached Refusal
## 34517 0
## Don't know No answer
## 0 0
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1547 855 1927 3004 3215 9198 8798 17673 28379 20162 34517
##
## Yes No Refusal Don't know No answer
## 21869 107450 0 0 0
We will drop household size as it is relatively uninterpretable. The others will be recoded for easier interpretation within the models.
We see many individuals with extremely high reported news consumption. Some report close to 24 hours a day (the maximum value is 1439 minutes of political news consumption per day, i.e. 23 hours 59 minutes). This is clearly an issue, particularly as many of these people also report low political interest. We therefore drop the daily political news consumption variable and rely on self-reported political interest and posting about politics online.
# focus on the political news consumption which appears to have some coding issues:
ess_clean |>
filter(cntry == "GB", essround>7) |>
select(nwspol) |>
plot_histogram()
## nwspol
## Min. : 0.00
## 1st Qu.: 20.00
## Median : 60.00
## Mean : 92.47
## 3rd Qu.: 120.00
## Max. :1439.00
## NA's :42
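To put the `nwspol` maximum in context, the reported minutes can be converted to hours and screened against a plausibility cut-off. This is only a sketch in base R on toy values; the 8-hour threshold is a hypothetical choice for illustration and is not used anywhere in the analysis.

```r
# Reported minutes per day (toy values; 1439 is the maximum seen above)
nwspol_minutes <- c(0, 20, 60, 120, 1439, NA)

# Convert to hours to judge plausibility
nwspol_hours <- nwspol_minutes / 60

# Hypothetical threshold: more than 8 hours a day is treated as implausible
implausible <- !is.na(nwspol_hours) & nwspol_hours > 8

max(nwspol_hours, na.rm = TRUE)  # just under 24 hours
sum(implausible)                 # number of flagged responses
```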
The variables are recoded as follows:
- agea - to standard age groups
- eisced - combine to fewer groups
- hinctnta - combine income deciles to 3 levels
- domicil - to 4pt scale, combining the two countryside categories
- hlthhmp - physical or mental disability, combine both yes categories
- polintr - binary yes/no (from 4pt scale)
Regroup all the 10pt scale variables (scored 0-10) to 3pt categorical variables: 0-3, 4-6, 7-10.
- lrscale - to 3pt scale: left (0-3), moderate (4-6) and right (7-10)
R8 onwards variables:
- netusoft - to daily or not daily (the majority of respondents use the internet daily, so the binary version still has class imbalance)
- nwspol (political news time per day) - DROPPED due to the data quality issue described above; the original plan was to group it into 1 hour or more versus less
- atchctr - convert to 3pt categorical
- pstplonl (posted about politics online) - keep as binary but relevel to “yes” and “no”
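The 0-3 / 4-6 / 7-10 grouping described above can be checked on a toy vector before it is applied to the survey data. This is a minimal sketch assuming `dplyr` is installed; the input vector is made up, and the 77/88/99 values stand in for the poorly coded missing-value codes that the recoding is meant to exclude.

```r
library(dplyr)

# Toy 0-10 responses plus stray missing codes (77/88/99) and a true NA
x <- c(0, 3, 4, 6, 7, 10, 77, 88, 99, NA)

x_3pt <- case_when(
  between(x, 0, 3)  ~ "Low",
  between(x, 4, 6)  ~ "Moderate",
  between(x, 7, 10) ~ "High",
  TRUE ~ NA_character_  # out-of-range codes and true NAs
)

table(x_3pt, useNA = "ifany")
```

Bounding every branch with `between()` means any value outside 0-10 falls through to `NA` instead of being counted in the top group.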
# function to convert a 0-10 scale variable to a 3-level factor (High/Moderate/Low):
convert_10pt_3pt <- function(var_10pt) {
factor(
case_when(
var_10pt < 4 ~ "Low",
between(var_10pt, 4, 6) ~ "Moderate",
between(var_10pt, 7, 10) ~ "High", # use range to avoid any poorly coded 77/88/99 values being included
TRUE ~ NA_character_ # account for true NAs from no response
),
levels = c("High", "Moderate", "Low"))
}
First, convert all to factor; this will give us the text labels too.
# add on recoded variables that are cleaned.
ess_clean <- ess_clean |>
mutate(
# create age groupings
age_rec = factor(case_when(agea <20 ~ "<20",
between(agea, 20, 29) ~ "20-29",
between(agea, 30, 44) ~ "30-44",
between(agea, 45, 64) ~ "45-64",
agea >= 65 ~ ">=65",
TRUE ~ NA),
levels = c("<20", "20-29", "30-44", "45-64", ">=65")),
# recode as factor for labels but these 3 are unchanged on scales
gender = as_factor(gndr),
ethnic_minority = as_factor(blgetmg),
# create education factor levels
educ_level = factor(case_when(between(eisced, 1, 2) ~ "Lower secondary", # 1 or 2 = low (lower secondary)
between(eisced, 3, 4) ~ "Upper secondary", # 3 or 4 = medium (upper secondary)
between(eisced, 5, 7) ~ "Tertiary", # 5, 6 or 7 = high (vocational/tertiary)
TRUE ~ NA),
levels = c("Lower secondary", "Upper secondary", "Tertiary")),
# self reported income stress (4pt scale)
income_stress = as_factor(hincfel),
# group income at 3pt level. This doesn't have zero value so we create our own groups.
income_group = factor(case_when(between(hinctnta, 1,3) ~ "Low (Decile <4)",
between(hinctnta, 4,7) ~ "Moderate (Decile 4-7)",
between(hinctnta, 8,10) ~ "High (Decile 8+)",
TRUE ~ NA_character_)),
# region type - just group country level data, so we have a 4pt scale.
area_type = fct_collapse(as_factor(domicil),
"Country" = c("Country village", "Farm or home in countryside")),
region = as_factor(region), # for if we map.
# health - physical/mental limitation flag:
health_disability = factor(case_when(between(hlthhmp, 1,2) ~ "Yes",
hlthhmp == 3 ~ "No",
TRUE ~ NA)),
# left right scale - manually so I can name
left_right = factor(case_when(lrscale < 4 ~ "Left (0-3)",
between(lrscale, 4, 6) ~ "Moderate (4-6)",
between(lrscale, 7, 10) ~ "Right (7-10)",
TRUE ~ NA)), #account for true NAs from no response
# convert political interest to binary
polintr_binary = case_when(polintr %in% c(1,2) ~ "Yes",
polintr %in% c(3,4) ~ "No",
TRUE ~ NA_character_),
# convert satisfactions from 10pt to 3pt scale
life_sat = convert_10pt_3pt(stflife),
econ_sat = convert_10pt_3pt(stfeco),
#attitudes to immigration - make country a better place
immig_support = convert_10pt_3pt(imwbcnt),
## R8 onwards variables:
daily_netuse = factor(case_when(as_factor(netusoft) %in% c("Never", "Only occasionally", "A few times a week", "Most days") ~ "No",
as_factor(netusoft) == "Every day" ~ "Yes",
TRUE ~ NA)),
country_attach = convert_10pt_3pt(atchctr),
post_polonline = factor(case_when(pstplonl == 2 ~ "No",
pstplonl == 1 ~ "Yes",
TRUE ~ NA))
) |>
select(-nwspol) ## removed due to data quality issues.
Now we keep only the cleaned and recoded columns:
ess_clean <- ess_clean |>
select(
essround,
cntry,
anweight,
trust_europ,
trust_parliament,
# demog vars:
gender, # gender - binary
age_rec, # age recoded - 5pt
educ_level, # education recoded - 3pt
income_group, # recoded deciles to 3pt
income_stress, # income stress level - 4pt
ethnic_minority, # minority ethnic group - binary
area_type, # regional description - 4pt
region, # UK 13 region names for mapping if required..
health_disability, # physical/mental disability - binary
left_right, ## recoded - 3pt
polintr_binary, # recoded - binary
life_sat, # recoded - 3pt
econ_sat, # recoded - 3pt
immig_support, # recoded - 3pt
daily_netuse, # recoded - binary
country_attach, # recode - 3pt
post_polonline # was already a binary
)
Now create a GB only file we use for our conditional means.
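The chunk that builds the GB-only file is not shown in this output. A plausible sketch, assuming it simply filters the cleaned data on `cntry`, is given below on toy data; `ess_clean_toy` and `gb_clean_toy` are hypothetical names used so that nothing here overwrites the real objects.

```r
library(dplyr)

# Toy stand-in for ess_clean (the real data has many more columns)
ess_clean_toy <- tibble(
  cntry       = c("GB", "FR", "GB", "DE"),
  essround    = c(6, 6, 8, 8),
  trust_europ = c(3, 5, 4, 6)
)

# Assumed definition of the GB-only file: keep GB respondents only
gb_clean_toy <- ess_clean_toy |>
  filter(cntry == "GB")

gb_clean_toy
```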
Now we calculate all of the means within groups - this is our actual bivariate analysis.
This loop runs through each column and returns the mean for each level, standard error and confidence intervals.
# setup survey design
simple_design <- gb_clean |>
as_survey_design(ids = 1, weights = anweight)
# create list of our demographic variables
demog_vars <- gb_clean |> select(gender:post_polonline) |> names()
demog_vars
## [1] "gender" "age_rec" "educ_level"
## [4] "income_group" "income_stress" "ethnic_minority"
## [7] "area_type" "region" "health_disability"
## [10] "left_right" "polintr_binary" "life_sat"
## [13] "econ_sat" "immig_support" "daily_netuse"
## [16] "country_attach" "post_polonline"
Create summary of all bivariate averages for EU trust within the UK:
# loop to create
all_trust_eu_summary <-
map_dfr(demog_vars, function(dv) {
# 1) build a one‐sided formula for svyby: ~gender, ~age_rec, etc.
by_fml <- as.formula(paste0("~", dv))
# 2) call svyby() for trust_europ
tmp <- svyby(
formula = ~trust_europ,
by = by_fml,
design = simple_design,
FUN = svymean,
vartype = c("ci", "se"),
ci.level = 0.95,
na.rm = TRUE)
# 3) Tidy column names & capture the “level” for this dv
tmp |>
rename(mean = trust_europ,
ci_low = ci_l,
ci_high = ci_u
) |>
mutate(
demog_var = dv,
level = as.character(.data[[dv]])
) |>
select(demog_var, level, mean, se, ci_low, ci_high)
}) |>
filter(demog_var != "region")
print(all_trust_eu_summary)
## demog_var
## Male gender
## Female gender
## <20 age_rec
## 20-29 age_rec
## 30-44 age_rec
## 45-64 age_rec
## >=65 age_rec
## Lower secondary educ_level
## Upper secondary educ_level
## Tertiary educ_level
## High (Decile 8+) income_group
## Low (Decile <4) income_group
## Moderate (Decile 4-7) income_group
## Living comfortably on present income income_stress
## Coping on present income income_stress
## Difficult on present income income_stress
## Very difficult on present income income_stress
## Yes...18 ethnic_minority
## No...19 ethnic_minority
## A big city area_type
## Suburbs or outskirts of big city area_type
## Town or small city area_type
## Country area_type
## No...24 health_disability
## Yes...25 health_disability
## Left (0-3) left_right
## Moderate (4-6) left_right
## Right (7-10) left_right
## No...29 polintr_binary
## Yes...30 polintr_binary
## High...31 life_sat
## Moderate...32 life_sat
## Low...33 life_sat
## High...34 econ_sat
## Moderate...35 econ_sat
## Low...36 econ_sat
## High...37 immig_support
## Moderate...38 immig_support
## Low...39 immig_support
## No...40 daily_netuse
## Yes...41 daily_netuse
## High...42 country_attach
## Moderate...43 country_attach
## Low...44 country_attach
## No...45 post_polonline
## Yes...46 post_polonline
## level
## Male Male
## Female Female
## <20 <20
## 20-29 20-29
## 30-44 30-44
## 45-64 45-64
## >=65 >=65
## Lower secondary Lower secondary
## Upper secondary Upper secondary
## Tertiary Tertiary
## High (Decile 8+) High (Decile 8+)
## Low (Decile <4) Low (Decile <4)
## Moderate (Decile 4-7) Moderate (Decile 4-7)
## Living comfortably on present income Living comfortably on present income
## Coping on present income Coping on present income
## Difficult on present income Difficult on present income
## Very difficult on present income Very difficult on present income
## Yes...18 Yes
## No...19 No
## A big city A big city
## Suburbs or outskirts of big city Suburbs or outskirts of big city
## Town or small city Town or small city
## Country Country
## No...24 No
## Yes...25 Yes
## Left (0-3) Left (0-3)
## Moderate (4-6) Moderate (4-6)
## Right (7-10) Right (7-10)
## No...29 No
## Yes...30 Yes
## High...31 High
## Moderate...32 Moderate
## Low...33 Low
## High...34 High
## Moderate...35 Moderate
## Low...36 Low
## High...37 High
## Moderate...38 Moderate
## Low...39 Low
## No...40 No
## Yes...41 Yes
## High...42 High
## Moderate...43 Moderate
## Low...44 Low
## No...45 No
## Yes...46 Yes
## mean se ci_low ci_high
## Male 3.661832 0.04535255 3.572943 3.750721
## Female 3.751493 0.03899138 3.675071 3.827915
## <20 5.452820 0.13041887 5.197203 5.708436
## 20-29 4.391354 0.08187294 4.230886 4.551822
## 30-44 4.111989 0.05849510 3.997341 4.226637
## 45-64 3.328544 0.05121673 3.228161 3.428927
## >=65 2.901149 0.05030493 2.802553 2.999745
## Lower secondary 3.250656 0.05718916 3.138568 3.362745
## Upper secondary 3.471924 0.06224368 3.349929 3.593919
## Tertiary 4.139420 0.04271285 4.055704 4.223135
## High (Decile 8+) 4.119414 0.05781470 4.006099 4.232728
## Low (Decile <4) 3.330410 0.06072259 3.211396 3.449424
## Moderate (Decile 4-7) 3.620040 0.05179004 3.518533 3.721546
## Living comfortably on present income 3.944316 0.04519895 3.855728 4.032904
## Coping on present income 3.582840 0.04567689 3.493315 3.672365
## Difficult on present income 3.375251 0.09108968 3.196718 3.553783
## Very difficult on present income 2.904656 0.16215237 2.586844 3.222469
## Yes...18 4.362149 0.11011247 4.146332 4.577965
## No...19 3.445696 0.03402885 3.379001 3.512392
## A big city 4.401590 0.09667298 4.212115 4.591066
## Suburbs or outskirts of big city 3.725615 0.06588841 3.596476 3.854754
## Town or small city 3.653724 0.04314942 3.569153 3.738295
## Country 3.424655 0.06001993 3.307019 3.542292
## No...24 3.928306 0.03464677 3.860399 3.996212
## Yes...25 3.096893 0.05695817 2.985257 3.208529
## Left (0-3) 4.429260 0.06673576 4.298461 4.560060
## Moderate (4-6) 3.632339 0.03891543 3.556066 3.708612
## Right (7-10) 3.291308 0.07362953 3.146997 3.435619
## No...29 3.361141 0.04646298 3.270076 3.452207
## Yes...30 3.944456 0.03867287 3.868659 4.020254
## High...31 3.900457 0.03501354 3.831832 3.969082
## Moderate...32 3.431863 0.06241805 3.309526 3.554200
## Low...33 2.316322 0.10934559 2.102009 2.530635
## High...34 4.442840 0.07100366 4.303675 4.582005
## Moderate...35 3.910122 0.04097903 3.829804 3.990439
## Low...36 3.091451 0.05229155 2.988962 3.193941
## High...37 4.797677 0.04447049 4.710517 4.884838
## Moderate...38 3.418175 0.04335615 3.333199 3.503152
## Low...39 1.970941 0.05287556 1.867307 2.074575
## No...40 3.081256 0.07379177 2.936627 3.225885
## Yes...41 4.092642 0.04320001 4.007972 4.177313
## High...42 3.833288 0.04733762 3.740508 3.926068
## Moderate...43 4.082375 0.07246415 3.940348 4.224402
## Low...44 3.515685 0.12050989 3.279490 3.751880
## No...45 3.674834 0.04258062 3.591377 3.758290
## Yes...46 4.339433 0.07795974 4.186634 4.492231
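To see what one iteration of the loop above computes, the sketch below runs `svyby()` on a toy data frame and compares the point estimate with a hand-computed weighted mean. It assumes the `survey` package is installed; the data and the names `y`, `g` and `w` are hypothetical, and the design has weights only, mirroring the `ids = 1` design used in the text.

```r
library(survey)

# Toy respondents: outcome y, grouping factor g, analysis weight w
toy <- data.frame(
  y = c(2, 4, 6, 8, 3, 5, 7, 9),
  g = factor(c("a", "a", "a", "a", "b", "b", "b", "b")),
  w = c(1, 2, 1, 2, 1, 1, 3, 1)
)

# No clusters or strata, weights only
des <- svydesign(ids = ~1, weights = ~w, data = toy)

# Weighted mean of y within each level of g, with SE and 95% CI
res <- svyby(~y, ~g, design = des, FUN = svymean,
             vartype = c("se", "ci"), na.rm = TRUE)

# The point estimate equals the plain weighted mean within the group
by_hand <- with(subset(toy, g == "a"), weighted.mean(y, w))
res
```

The standard errors and intervals come from the survey design, so they will generally differ from those of an unweighted `t.test()` on the same groups.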
Calculate overall average for plotting comparison point.
# calculate averages across all waves to get an idea and standardise later:
overall_eu_trust <- gb_clean |>
as_survey_design(ids = 1, weights = anweight) |>
filter(!is.na(trust_europ)) |>
summarise(mean_trust = survey_mean(trust_europ, vartype = "ci"))
print(overall_eu_trust)
## # A tibble: 1 × 3
## mean_trust mean_trust_low mean_trust_upp
## <dbl> <dbl> <dbl>
## 1 3.71 3.65 3.77
# pull the avg for standardising.
overall_avg <- overall_eu_trust$mean_trust[1]
overall_avg
## [1] 3.706561
Now plot with the confidence intervals
all_trust_eu_summary |>
ggplot(aes(x = mean,
y = fct_reorder(paste0(demog_var, ": ", level), mean))) +
geom_point(size = 2, colour = "steelblue") +
geom_errorbarh(aes(xmin = ci_low, xmax = ci_high),
height = 0, colour = "gray50") +
# facet_grid(demog_var ~ ., scales = "free_y", space = "free_y") +
labs(
x = "Weighted mean of Trust in EU Parliament",
y = NULL,
title = "Conditional Mean of Trust in EU Parliament, R6-R11, GB only",
subtitle = "Point estimates ± 95% CI"
) +
theme_minimal() +
theme(
strip.text.y = element_text(angle = 0, face = "bold"),
axis.text.y = element_text(size = 10),
panel.spacing.y = unit(0.5, "lines")
)+
geom_vline(xintercept = overall_avg)
Now grouped by the actual demographic variables too:
all_trust_eu_summary |>
ggplot(aes(x = mean,
y = fct_reorder(level, mean))) +
geom_point(size = 2, colour = "steelblue") +
geom_errorbarh(aes(xmin = ci_low, xmax = ci_high),
height = 0, colour = "gray50") +
facet_grid(demog_var ~ ., scales = "free_y", space = "free_y") +
labs(
x = "Weighted mean of Trust in EU Parliament",
y = NULL,
title = "Conditional Mean of Trust in EU Parliament, R6 to R11, GB only",
subtitle = "Point estimates ± 95% CI"
) +
theme_minimal() +
theme(
strip.text.y = element_text(angle = 0, face = "bold"),
axis.text.y = element_text(size = 10),
panel.spacing.y = unit(0.5, "lines")
) +
geom_vline(xintercept = overall_avg)+
scale_y_discrete(expand = expansion(add = c(1, 1)),
position = "left")
We calculate the standardised data by subtracting the overall average from each group mean and its confidence bounds (mean-centring). We exclude region because the regions are not granular enough and the data appear to disagree with estimates of the Brexit vote; in this data, London has the highest levels of EU trust.
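As a quick arithmetic check of this step, the sketch below centres one group on the overall average using values copied from the output above (the overall EU trust mean and the ">=65" age group). It is base R only and is meant as an illustration of the subtraction, not as part of the pipeline.

```r
# Values copied from the output above: overall mean and the ">=65" age group
overall_avg <- 3.706561
grp <- c(mean = 2.901149, ci_low = 2.802553, ci_high = 2.999745)

# Subtracting a constant shifts the estimate and both CI bounds equally,
# so the interval width is unchanged
centred <- grp - overall_avg
centred
```

A negative centred mean, as here, indicates a group whose average trust sits below the overall GB average.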
# ensure you calculated the overall average in the data above.
standardized_trust_eu <- all_trust_eu_summary |>
mutate(std_mean = mean - overall_avg,
std_ci_low = ci_low - overall_avg,
std_ci_high = ci_high - overall_avg) |>
filter(demog_var != "region")
And plot the standardised results
ggplot(standardized_trust_eu,
aes(x = std_mean,
y = fct_reorder(paste0(demog_var, ": ", level), mean))) +
geom_point(size = 2, colour = "steelblue") +
geom_errorbarh(aes(xmin = std_ci_low, xmax = std_ci_high),
height = 0, colour = "gray50") +
facet_grid(demog_var ~ ., scales = "free_y", space = "free_y") +
labs(
x = "Standardised mean of Trust in EU Parliament",
y = NULL,
title = "Standardised Mean of Trust in EU by Demographic Groups, R6 to R11, GB only",
subtitle = "Point estimates ± 95% CI"
) +
theme_minimal() +
theme(
strip.text.y = element_text(angle = 0, face = "bold"),
axis.text.y = element_text(size = 10),
panel.spacing.y = unit(0.5, "lines")
)+
geom_vline(xintercept = 0)
Repeat the process for trust in local parliament.
# setup survey design
simple_design <- gb_clean |>
as_survey_design(ids = 1, weights = anweight)
# create list of our demographic variables
demog_vars <- gb_clean |> select(gender:post_polonline) |> names()
demog_vars
## [1] "gender" "age_rec" "educ_level"
## [4] "income_group" "income_stress" "ethnic_minority"
## [7] "area_type" "region" "health_disability"
## [10] "left_right" "polintr_binary" "life_sat"
## [13] "econ_sat" "immig_support" "daily_netuse"
## [16] "country_attach" "post_polonline"
# local parliament: loop through each demographic variable and bind the results
all_trust_local_summary <-
map_dfr(demog_vars, function(dv) {
# 1) build a one‐sided formula for svyby: ~gender, ~age_rec, etc.
by_fml <- as.formula(paste0("~", dv))
# 2) call svyby() for trust_parliament
tmp <- svyby(
formula = ~trust_parliament,
by = by_fml,
design = simple_design,
FUN = svymean,
vartype = c("ci", "se"),
ci.level = 0.95,
na.rm = TRUE
)
# 3) Tidy column names & capture the “level” for this dv
tmp |>
rename(
mean = trust_parliament,
ci_low = ci_l,
ci_high = ci_u) |>
mutate(
demog_var = dv,
level = as.character(.data[[dv]])
) |>
select(demog_var, level, mean, se, ci_low, ci_high)
}) |> filter(demog_var != "region")
print(all_trust_local_summary)
## demog_var
## Male gender
## Female gender
## <20 age_rec
## 20-29 age_rec
## 30-44 age_rec
## 45-64 age_rec
## >=65 age_rec
## Lower secondary educ_level
## Upper secondary educ_level
## Tertiary educ_level
## High (Decile 8+) income_group
## Low (Decile <4) income_group
## Moderate (Decile 4-7) income_group
## Living comfortably on present income income_stress
## Coping on present income income_stress
## Difficult on present income income_stress
## Very difficult on present income income_stress
## Yes...18 ethnic_minority
## No...19 ethnic_minority
## A big city area_type
## Suburbs or outskirts of big city area_type
## Town or small city area_type
## Country area_type
## No...24 health_disability
## Yes...25 health_disability
## Left (0-3) left_right
## Moderate (4-6) left_right
## Right (7-10) left_right
## No...29 polintr_binary
## Yes...30 polintr_binary
## High...31 life_sat
## Moderate...32 life_sat
## Low...33 life_sat
## High...34 econ_sat
## Moderate...35 econ_sat
## Low...36 econ_sat
## High...37 immig_support
## Moderate...38 immig_support
## Low...39 immig_support
## No...40 daily_netuse
## Yes...41 daily_netuse
## High...42 country_attach
## Moderate...43 country_attach
## Low...44 country_attach
## No...45 post_polonline
## Yes...46 post_polonline
## level
## Male Male
## Female Female
## <20 <20
## 20-29 20-29
## 30-44 30-44
## 45-64 45-64
## >=65 >=65
## Lower secondary Lower secondary
## Upper secondary Upper secondary
## Tertiary Tertiary
## High (Decile 8+) High (Decile 8+)
## Low (Decile <4) Low (Decile <4)
## Moderate (Decile 4-7) Moderate (Decile 4-7)
## Living comfortably on present income Living comfortably on present income
## Coping on present income Coping on present income
## Difficult on present income Difficult on present income
## Very difficult on present income Very difficult on present income
## Yes...18 Yes
## No...19 No
## A big city A big city
## Suburbs or outskirts of big city Suburbs or outskirts of big city
## Town or small city Town or small city
## Country Country
## No...24 No
## Yes...25 Yes
## Left (0-3) Left (0-3)
## Moderate (4-6) Moderate (4-6)
## Right (7-10) Right (7-10)
## No...29 No
## Yes...30 Yes
## High...31 High
## Moderate...32 Moderate
## Low...33 Low
## High...34 High
## Moderate...35 Moderate
## Low...36 Low
## High...37 High
## Moderate...38 Moderate
## Low...39 Low
## No...40 No
## Yes...41 Yes
## High...42 High
## Moderate...43 Moderate
## Low...44 Low
## No...45 No
## Yes...46 Yes
## mean se ci_low ci_high
## Male 4.428510 0.04244543 4.345319 4.511702
## Female 4.153705 0.03678847 4.081601 4.225809
## <20 5.151996 0.13142577 4.894406 5.409586
## 20-29 4.219293 0.08240028 4.057792 4.380795
## 30-44 4.319867 0.05876435 4.204691 4.435043
## 45-64 4.128398 0.04934188 4.031690 4.225106
## >=65 4.280463 0.04816351 4.186065 4.374862
## Lower secondary 3.977248 0.05194717 3.875433 4.079063
## Upper secondary 4.069206 0.05943507 3.952716 4.185697
## Tertiary 4.604417 0.04101823 4.524023 4.684811
## High (Decile 8+) 4.706492 0.05528409 4.598138 4.814847
## Low (Decile <4) 3.906961 0.05587625 3.797445 4.016476
## Moderate (Decile 4-7) 4.290048 0.05031201 4.191438 4.388658
## Living comfortably on present income 4.720359 0.04235267 4.637349 4.803369
## Coping on present income 4.100594 0.04279683 4.016713 4.184474
## Difficult on present income 3.634803 0.08242277 3.473258 3.796349
## Very difficult on present income 3.242067 0.15784731 2.932692 3.551442
## Yes...18 5.016962 0.11124087 4.798934 5.234990
## No...19 4.368119 0.03230064 4.304811 4.431427
## A big city 4.437521 0.09062049 4.259908 4.615134
## Suburbs or outskirts of big city 4.384884 0.06324464 4.260926 4.508841
## Town or small city 4.205546 0.04102390 4.125140 4.285951
## Country 4.286621 0.05559105 4.177665 4.395578
## No...24 4.492962 0.03299777 4.428288 4.557636
## Yes...25 3.733874 0.05189767 3.632156 3.835592
## Left (0-3) 3.774719 0.06351516 3.650232 3.899206
## Moderate (4-6) 4.279230 0.03706646 4.206581 4.351879
## Right (7-10) 5.277730 0.06750932 5.145414 5.410045
## No...29 3.863933 0.04402526 3.777645 3.950221
## Yes...30 4.601954 0.03596849 4.531457 4.672451
## High...31 4.617455 0.03265577 4.553451 4.681459
## Moderate...32 3.672681 0.05482073 3.565234 3.780128
## Low...33 2.464715 0.10095329 2.266850 2.662580
## High...34 6.029402 0.06079131 5.910253 6.148551
## Moderate...35 4.681446 0.03557509 4.611720 4.751172
## Low...36 2.957243 0.04392285 2.871156 3.043330
## High...37 4.792249 0.04661624 4.700883 4.883615
## Moderate...38 4.324764 0.04184636 4.242746 4.406781
## Low...39 3.163536 0.05661984 3.052563 3.274509
## No...40 4.073239 0.06656520 3.942773 4.203704
## Yes...41 4.321072 0.04206053 4.238635 4.403509
## High...42 4.716166 0.04312436 4.631644 4.800688
## Moderate...43 4.000235 0.06896866 3.865059 4.135412
## Low...44 2.640777 0.09694673 2.450765 2.830789
## No...45 4.311424 0.03938434 4.234233 4.388616
## Yes...46 4.123531 0.07916251 3.968376 4.278687
Calculate the overall average of GB trust in Parliament:
# calculate averages across all waves to get an idea and standardise later:
overall_gb_trust <- gb_clean |>
as_survey_design(ids = 1, weights = anweight) |>
filter(!is.na(trust_parliament)) |>
summarise(mean_trust = survey_mean(trust_parliament, vartype = "ci"))
print(overall_gb_trust)
## # A tibble: 1 × 3
## mean_trust mean_trust_low mean_trust_upp
## <dbl> <dbl> <dbl>
## 1 4.29 4.23 4.34
# pull the avg for standardising.
overall_avg_local <- overall_gb_trust$mean_trust[1]
overall_avg_local
## [1] 4.288857
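To make the standardising step concrete, here is a minimal base R sketch on made-up numbers. The `toy` data frame, its values and the name `overall_avg_toy` are hypothetical and are not ESS data. It computes a weighted mean, which is the point estimate that `survey_mean()` reports, and then centres each group mean on it. It does not attempt to reproduce the design-based confidence intervals.

```r
# Hypothetical toy data: trust scores (0-10) with analysis weights
toy <- data.frame(
  group  = c("A", "A", "B", "B"),
  trust  = c(4, 6, 2, 4),
  weight = c(1, 3, 2, 2)
)

# Overall weighted mean across all respondents
overall_avg_toy <- weighted.mean(toy$trust, toy$weight)

# Weighted mean within each group
group_means <- sapply(
  split(toy, toy$group),
  function(d) weighted.mean(d$trust, d$weight)
)

# Centre each group mean on the overall average:
# positive = above-average trust, negative = below-average trust
std_means <- group_means - overall_avg_toy
std_means
```

Centring only shifts the values, so the spacing between groups and the width of any interval stay the same.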
Plot the values for local parliament:
ggplot(all_trust_local_summary,
aes(x = mean,
y = fct_reorder(paste0(demog_var, ": ", level), mean))) +
geom_point(size = 2, colour = "steelblue") +
geom_errorbarh(aes(xmin = ci_low, xmax = ci_high),
height = 0, colour = "gray50") +
# facet_grid(demog_var ~ ., scales = "free_y", space = "free_y") +
labs(
x = "Weighted mean of Trust in Parliament",
y = NULL,
title = "Conditional Mean of Trust in UK Parliament by Demographic Groups",
subtitle = "Point estimates ± 95% CI"
) +
theme_minimal() +
theme(
strip.text.y = element_text(angle = 0, face = "bold"),
axis.text.y = element_text(size = 10),
panel.spacing.y = unit(0.5, "lines")
)+
geom_vline(xintercept = overall_avg_local)
And the same data, grouped by variable and ordered within each group:
ggplot(all_trust_local_summary,
aes(x = mean,
y = fct_reorder(level, mean))) +
geom_point(size = 2, colour = "steelblue") +
geom_errorbarh(aes(xmin = ci_low, xmax = ci_high),
height = 0, colour = "gray50") +
facet_grid(demog_var~., scales = "free_y", space = "free_y") +
labs(
x = "Weighted mean of Trust in Parliament",
y = NULL,
title = "Conditional Mean of Trust in UK Parliament by Demographic Groups",
subtitle = "Point estimates ± 95% CI"
) +
theme_minimal() +
theme(
strip.text.y = element_text(angle = 0, face = "bold"),
axis.text.y = element_text(size = 10),
panel.spacing.y = unit(0.5, "lines")
)+
geom_vline(xintercept = overall_avg_local)
Combine the data for European and local (UK) Parliament trust.
#join
overall_demog_trust <- left_join(
all_trust_eu_summary, all_trust_local_summary,
by = c("demog_var", "level"),
suffix = c(".eu", ".uk")
)
overall_demog_trust
overall_demog_trust |>
ggplot(aes(x = mean.eu,
y = mean.uk))+
geom_point(size = 2)+
theme_minimal()+
labs(title = "Average trust in EU vs UK parliament by demographic group",
x = "Average EU Parliament trust (/10)",
y = "Average UK Parliament trust (/10)")
## [1] 0.6460012
We see a fairly strong positive correlation (about 0.65) between the group-level means. This estimate is biased, because every group mean is computed from the same respondents, so the groups are not independent. It does show, however, that demographic groups with higher trust in the UK Parliament generally also have higher trust in the European Parliament. This is consistent with the finding of (Harteveld et al. 2013) that EU trust is driven by local government trust.
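The code behind the correlation value printed above is not shown in this section. A minimal sketch of how such a figure could be obtained is given below, assuming it is a Pearson correlation between the group-level means. The `toy_means` data frame is hypothetical and its numbers are not the ESS estimates.

```r
# Hypothetical group-level means (illustrative only, not the ESS estimates)
toy_means <- data.frame(
  mean.eu = c(3.0, 3.5, 4.0, 4.5),
  mean.uk = c(4.0, 4.2, 5.1, 5.0)
)

# Pearson correlation between the two columns.
# use = "complete.obs" drops groups with a missing mean in either column,
# which can happen after a left join.
r <- cor(toy_means$mean.eu, toy_means$mean.uk, use = "complete.obs")
r
```

Because each respondent contributes to several demographic groups, the points are not independent, so a value like this describes the pattern rather than supporting a formal test.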
Plot with negative values for UK to show the group’s values next to each other:
# Create a modified dataset with negated UK values (for left alignment)
# and values centred on the overall averages
plot_data <- overall_demog_trust |>
mutate(neg.mean.uk = -mean.uk, # Negate UK values so they appear on the left
std.uk = mean.uk - overall_avg_local,
std.uk_cilow = ci_low.uk - overall_avg_local,
std.uk_cihigh = ci_high.uk - overall_avg_local,
std.eu = mean.eu - overall_avg,
std.eu_cilow = ci_low.eu - overall_avg,
std.eu_cihigh = ci_high.eu - overall_avg)
# Plot
ggplot(plot_data) +
# plot eu values
geom_point(aes(x = mean.eu, y = fct_reorder(level, mean.eu)),
color = "steelblue", size = 2) + # EU
geom_errorbarh(aes(y = level, xmin = ci_low.eu, xmax = ci_high.eu),
color = "gray50", height = 0) +
# plot gb values
geom_point(aes(x = mean.uk, y = fct_reorder(level, mean.uk)),
color = "orange", size = 2) + # UK
geom_errorbarh(aes(y = level, xmin = ci_low.uk, xmax = ci_high.uk),
color = "gray50", height = 0) +
labs(
x = "Weighted Mean of Trust",
y = NULL,
title = "Trust in EU vs UK Parliament by Demographic Group",
subtitle = "EU = Blue; UK = Orange"
) +
theme_minimal() +
facet_grid(demog_var ~ ., scales = "free_y", space = "free_y",
switch = "y") + # For left-side placement of "strip" label
theme(
strip.text.y.left = element_text(angle = 0, face = "bold", hjust = 0),
strip.placement = "outside",
axis.text.y = element_text(size = 10, hjust = 1),
panel.spacing.y = unit(0.5, "lines")
) +
scale_y_discrete(expand = expansion(add = c(1, 1)),
position = "left")
Standardised trends for trust in UK and EU Parliament by demographic group:
# Plot with standardised data to show around the same zero value
ggplot(plot_data) +
geom_point(aes(x = std.eu, y = fct_reorder(interaction(demog_var, level), std.eu)),
color = "steelblue", size = 2) + # EU
geom_errorbarh(aes(y = interaction(demog_var, level),
xmin = std.eu_cilow, xmax = std.eu_cihigh),
color = "gray50", height = 0) +
geom_point(aes(x = std.uk, y = fct_reorder(interaction(demog_var,level), std.uk)),
color = "orange", size = 2) + # UK
geom_errorbarh(aes(y = interaction(demog_var, level),
xmin = std.uk_cilow, xmax = std.uk_cihigh),
color = "gray50", height = 0) +
# geom_vline(xintercept = , linetype = "dashed", color = "black") + # Middle reference line
labs(
x = "Weighted Mean of Trust in UK and EU Parliament, UK R5-8",
y = NULL,
title = "Standardised trust in EU vs UK Parliament",
subtitle = "Blue = EU; Orange = UK. Point estimates with 95% CI"
) +
theme_minimal()+
scale_y_discrete(expand = expansion(add = c(1, 1))) # Adds space before/after categories
Plot standardised:
# Plot
ggplot(plot_data) +
# plot eu values
geom_point(aes(x = std.eu, y = fct_reorder(level, std.eu)),
color = "steelblue", size = 2) + # EU
geom_errorbarh(aes(y = level, xmin = std.eu_cilow, xmax = std.eu_cihigh),
color = "gray50", height = 0) +
# plot gb values
geom_point(aes(x = std.uk, y = level),
color = "orange", size = 2) + # UK
geom_errorbarh(aes(y = level, xmin = std.uk_cilow, xmax = std.uk_cihigh),
color = "gray50", height = 0) +
labs(
x = "Below average trust in EU/UK <<< | >>> Above average trust in EU/UK ",
y = NULL,
title = "Standardised trust in EU vs UK Parliament by Demographic Group",
subtitle = "EU = Blue; UK = Orange"
) +
theme_minimal() +
facet_grid(demog_var ~ ., scales = "free_y", space = "free_y",
switch = "y") + # For left-side placement of "strip" label
theme(
strip.text.y.left = element_text(angle = 0, face = "bold", hjust = 0),
strip.placement = "outside",
axis.text.y = element_text(size = 10, hjust = 1),
panel.spacing.y = unit(0.5, "lines")
) +
scale_y_discrete(expand = expansion(add = c(1, 1)),
position = "left") +
geom_vline(xintercept = 0)
Plot a subset for report:
# Plot subset for smaller report chart
plot_data |> filter(demog_var %in% c("age_rec", "country_attach", "econ_sat", "immig_support", "left_right")) |>
mutate(
demog_var = case_when(
demog_var == "age_rec" ~ "Age",
demog_var == "country_attach" ~ "UK Attachment",
demog_var == "econ_sat" ~ "Economic Satisfaction",
demog_var == "immig_support" ~ "Immigration Support",
demog_var == "left_right" ~ "L-R Political Scale",
TRUE ~ demog_var # This ensures any other values (if they existed and were not filtered) remain unchanged
)) |>
ggplot() +
# plot eu values
geom_point(aes(x = std.eu, y = fct_reorder(level, std.eu)),
color = "steelblue", size = 2) + # EU
geom_errorbarh(aes(y = level, xmin = std.eu_cilow, xmax = std.eu_cihigh),
color = "gray50", height = 0) +
# plot gb values
geom_point(aes(x = std.uk, y = level),
color = "orange", size = 2) + # UK
geom_errorbarh(aes(y = level, xmin = std.uk_cilow, xmax = std.uk_cihigh),
color = "gray50", height = 0) +
labs(
x = "Below average trust in EU/UK <<< | >>> Above average trust in EU/UK ",
y = NULL,
title = "Standardised trust in EU vs UK Parliament by Demographic Group",
subtitle = "EU = Blue; UK = Orange"
) +
theme_minimal() +
facet_grid(demog_var ~ ., scales = "free_y", space = "free_y",
switch = "y") + # For left-side placement of "strip" label
theme(
strip.text.y.left = element_text(angle = 0, face = "bold", hjust = 0),
strip.placement = "outside",
axis.text.y = element_text(size = 10, hjust = 1),
panel.spacing.y = unit(0.5, "lines")
) +
scale_y_discrete(expand = expansion(add = c(1, 1)),
position = "left") +
geom_vline(xintercept = 0)
Plot as table using stargazer:
# summary_plot_data <- plot_data |>
# select(demog_var, level, mean.eu, se.eu, mean.uk, se.uk)
#
# stargazer(summary_plot_data,
# type = "text", # Use "latex" or "html" for different formats
# summary = FALSE, # Prevent summary statistics
# rownames = FALSE, # Remove row numbers
# title = "Summary Table",
# column.labels = c("Demographic Variable", "Level", "Mean EU", "SE EU", "Mean UK", "SE UK"),
# align = TRUE)